8.1 Introduction

Twenty-five years ago, in the early 1980s, morphological analysis of natural
language was a challenge to computational linguists. Simple cut-and-paste
programs could be and were written to analyze strings in particular languages,
but there was no general language-independent method available. Furthermore,
cut-and-paste programs for analysis were not reversible; they could not
be used to generate words. Generative phonologists of that time described
morphological alternations by means of ordered rewrite rules, but it was not
understood how such rules could be used for analysis.

This was the situation in the spring of 1981 when Kimmo Koskenniemi 
came to a conference on parsing that Lauri Karttunen had organized at the 
University of Texas at Austin. Also at the same conference were two Xerox 
researchers from Palo Alto, Ronald M. Kaplan and Martin Kay. The four
Ks discovered that all of them were interested in, and had been working on, the
problem of morphological analysis. Koskenniemi went on to Palo Alto to visit
Kay and Kaplan at Xerox PARC. 

This was the beginning of Two-Level Morphology, the first general model 
in the history of computational linguistics for the analysis and generation of 
morphologically complex languages. The language-specific components, the 
lexicon and the rules, were combined with a runtime engine applicable to all 
languages. 
8.2 The Origins

Traditional phonological grammars, formalized by Chomsky and Halle
(1968), consisted of an ordered sequence of rewrite rules that converted
abstract phonological representations into surface forms through a series
of intermediate representations. Such rewrite rules have the general form
$\alpha \rightarrow \beta \,/\, \gamma \,\_\, \delta$, where $\alpha$, $\beta$, $\gamma$, and $\delta$ can be arbitrarily
complex strings or feature matrices. The rule is read "$\alpha$ is rewritten as $\beta$
between $\gamma$ and $\delta$". In mathematical linguistics (Partee et al. 1993), such rules
are called CONTEXT-SENSITIVE REWRITE RULES, and they are more powerful than
regular expressions or context-free rewrite rules.
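To make the notation concrete, the following Python sketch applies one rule of the form α → β / γ _ δ to a string. It is only an illustration under simplifying assumptions: α, β, γ, and δ are plain strings rather than feature matrices, the rule makes a single left-to-right pass, and the contexts are checked against the input. The `rewrite` helper is hypothetical; it relies on lookaround assertions in Python's `re` module, which require fixed-width contexts.

```python
import re


def rewrite(alpha: str, beta: str, gamma: str, delta: str, s: str) -> str:
    """Rewrite alpha as beta between gamma and delta, in one left-to-right pass.

    gamma and delta become zero-width lookarounds, so they are matched against
    the input string and are left unchanged. An empty context is unrestricted.
    """
    left = f"(?<={re.escape(gamma)})" if gamma else ""
    right = f"(?={re.escape(delta)})" if delta else ""
    pattern = left + re.escape(alpha) + right
    # A function replacement inserts beta literally (no backslash processing).
    return re.sub(pattern, lambda _match: beta, s)


# "a" is rewritten as "b" only between "c" and "d".
print(rewrite("a", "b", "c", "d", "cad cab dad"))  # cbd cab dad
```

Only the first word changes: in "cab" the right context is missing, and in "dad" the left context is.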

In 1972, C. Douglas Johnson published his dissertation, Formal Aspects of
Phonological Description, wherein he showed that phonological rewrite rules
are actually much less powerful than the notation suggests. Johnson observed
that, while the same context-sensitive rule could be applied several times
recursively to its own output, phonologists have always assumed implicitly that
the site of application moves to the right or to the left in the string after each
application. For example, if the rule $\alpha \rightarrow \beta \,/\, \gamma \,\_\, \delta$ is used to rewrite the
string $\gamma\alpha\delta$ as $\gamma\beta\delta$, any subsequent application of the same rule must leave
the $\beta$ part unchanged, affecting only $\gamma$ or $\delta$. Johnson demonstrated that the
effect of this constraint is that the pairs of inputs and outputs produced by
a phonological rewrite rule can be modeled by a finite-state transducer. This
result was largely overlooked at the time and was rediscovered by Ronald
M. Kaplan and Martin Kay around 1980. Putting things into a more algebraic
perspective than Johnson, Kaplan and Kay showed that phonological rewrite
rules describe regular relations. By definition, a regular relation can be
represented by a finite-state transducer.

Johnson was already aware of an important mathematical property of 
finite-state transducers established by Schützenberger (1961): there exists, for
any pair of transducers applied sequentially, an equivalent single transducer. 
Any cascade of rule transducers can in principle be composed into a single 
transducer that maps lexical forms directly into the corresponding surface 
forms, and vice versa, without any intermediate representations. 
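The composition construction can be sketched in Python for the simple case of epsilon-free transducers, where every arc reads and writes exactly one symbol. A state of the composed machine is a pair of states, and it has an arc wherever the output of the first transducer matches the input of the second. The tuple representation and the names `compose` and `apply_down` are hypothetical illustrations, not the interface of any of the tools discussed in this chapter.

```python
from collections import defaultdict

# A transducer is (start, finals, arcs), where arcs[state] lists
# (input_symbol, output_symbol, next_state). Epsilon-free by assumption.


def compose(t1, t2):
    """Return one transducer equivalent to applying t1 and then t2."""
    start1, finals1, arcs1 = t1
    start2, finals2, arcs2 = t2
    start = (start1, start2)
    arcs, finals = defaultdict(list), set()
    agenda, seen = [start], {start}
    while agenda:
        state = agenda.pop()
        q1, q2 = state
        if q1 in finals1 and q2 in finals2:
            finals.add(state)
        for a, b, r1 in arcs1.get(q1, []):
            for b2, c, r2 in arcs2.get(q2, []):
                if b == b2:  # the output of t1 feeds the input of t2
                    nxt = (r1, r2)
                    arcs[state].append((a, c, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        agenda.append(nxt)
    return start, finals, dict(arcs)


def apply_down(t, s):
    """Return every output that transducer t can produce for input s."""
    start, finals, arcs = t
    configs = {(start, "")}
    for sym in s:
        configs = {(r, out + b)
                   for q, out in configs
                   for a, b, r in arcs.get(q, [])
                   if a == sym}
    return {out for q, out in configs if q in finals}


SIGMA = "abc"
# t1 rewrites every a as b; t2 rewrites every b as c (single-state transducers).
t1 = (0, {0}, {0: [(x, "b" if x == "a" else x, 0) for x in SIGMA]})
t2 = (0, {0}, {0: [(x, "c" if x == "b" else x, 0) for x in SIGMA]})
t12 = compose(t1, t2)
print(apply_down(t12, "abc"))  # {'ccc'}
```

Composing in the other order, `compose(t2, t1)`, maps abc to bcc instead, so the order of a cascade is preserved in the single merged machine.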

These theoretical insights did not immediately lead to practical results. The 
development of a compiler for rewrite rules turned out to be a very complex 
task. It became clear that building a compiler required as a first step a
complete implementation of basic finite-state operations such as union,
intersection, complementation, and composition. Developing a complete finite-state
calculus was a challenge in itself on the computers that were available at the 
time. 

Another reason for the slow progress may have been that there were
persistent doubts about the practicality of the approach for morphological
analysis. Traditional phonological rewrite rules describe the correspondence
between lexical forms and surface forms as a one-directional, sequential
mapping from lexical forms to surface forms. Even if it was possible to model
the generation of surface forms efficiently by means of finite-state
transducers, it was not evident that it would lead to an efficient analysis
procedure going in the reverse direction, from surface forms to lexical forms.

Let us consider a simple illustration of the problem with two sequentially
applied rewrite rules, N -> m / _ p and p -> m / m _. The
corresponding transducers map the lexical form kaNpat unambiguously to
kammat, with kampat as the intermediate representation. However, if we apply the
same transducers in the other direction to the input kammat, we get the three
results shown in Figure 1.




FIGURE 1 Deterministic Generation, Nondeterministic Analysis
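The asymmetry in Figure 1 can be reproduced with a small Python sketch in which each rule is a string function. It is a brute-force stand-in for running transducers in reverse: `invert` enumerates the candidate inputs that could yield a given output and keeps those that the rule maps back to it, which works here only because both rules preserve length. The function names are hypothetical, and the second rule is assumed to check its left context on the output as it proceeds left to right.

```python
from itertools import product


def rule_n_to_m(s: str) -> str:
    """N -> m / _ p : N becomes m when the next input symbol is p."""
    return "".join("m" if c == "N" and s[i + 1 : i + 2] == "p" else c
                   for i, c in enumerate(s))


def rule_p_to_m(s: str) -> str:
    """p -> m / m _ : p becomes m after an m (checked on the output so far)."""
    out = []
    for c in s:
        out.append("m" if c == "p" and out and out[-1] == "m" else c)
    return "".join(out)


def invert(rule, output: str, sources: dict) -> set:
    """All inputs that `rule` maps to `output`.

    `sources` lists which other input symbols could have produced a given
    output symbol, e.g. {"m": ["N"]}; each candidate is checked by running
    the rule forward.
    """
    choices = [[c, *sources.get(c, [])] for c in output]
    return {"".join(cand) for cand in product(*choices)
            if rule("".join(cand)) == output}


def generate(lexical: str) -> str:
    return rule_p_to_m(rule_n_to_m(lexical))


def analyze(surface: str) -> set:
    """Undo the cascade in reverse order: the second rule first."""
    results = set()
    for intermediate in invert(rule_p_to_m, surface, {"m": ["p"]}):
        results |= invert(rule_n_to_m, intermediate, {"m": ["N"]})
    return results


print(generate("kaNpat"))         # kammat
print(sorted(analyze("kammat")))  # ['kaNpat', 'kammat', 'kampat']
```

Generation follows a single path, while analysis of kammat fans out into the three lexical candidates of Figure 1.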

This asymmetry is an inherent property of the generative approach to 
phonological description. If all the rules are deterministic and obligatory and 
if the order of the rules is fixed, each lexical form generates only one surface 
form. But a surface form can typically be generated in more than one way, 
and the number of possible analyses grows with the number of rules that are 
involved. Some of the analyses may turn out to be invalid because the
putative lexical forms, say kammat and kampat in this case, might not exist in
the language. But in order to look them up in the lexicon, the system must
first complete the analysis. Depending on the number of rules involved, a
surface form could easily have dozens of potential lexical forms, even an infinite
number in the case of certain deletion rules. 

Although the generation problem had been solved by Johnson, Kaplan and 
Kay, at least in principle, the problem of efficient morphological analysis in 
the Chomsky-Halle paradigm was still seen as a formidable challenge. As
counterintuitive as it was, it appeared that analysis was computationally a
much more difficult task than generation. Composing all the rule transducers
into a single one would not solve the "overanalysis" problem. Because the
resulting single transducer is equivalent to the original cascade, the ambiguity
remains.

The solution to the overanalysis problem should have been obvious: to
formalize the lexicon itself as a finite-state transducer and compose the lexicon
with the rules. In this way, all the spurious ambiguities produced by the rules
are eliminated at compile time. The resulting single transducer contains only
lexical forms that actually exist in the language. When this idea first surfaced
in Karttunen et al. (1992), it was not in connection with traditional rewrite
rules but with an entirely different finite-state formalism that had been
introduced in the meantime, called two-level rules (Koskenniemi 1983).
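For a finite word list, the effect of composing the lexicon with the rules can be sketched in a few lines of Python: running every lexical entry through the rule cascade ahead of time yields exactly the (lexical, surface) pairs of the composed relation, so spurious analyses such as kampat and kammat never arise. The dictionary index, the toy lexicon, and the function names are hypothetical; a real system composes finite-state networks, which also works when the lexicon is cyclic and describes infinitely many words.

```python
def rule_n_to_m(s: str) -> str:
    """N -> m / _ p : N becomes m when the next input symbol is p."""
    return "".join("m" if c == "N" and s[i + 1 : i + 2] == "p" else c
                   for i, c in enumerate(s))


def rule_p_to_m(s: str) -> str:
    """p -> m / m _ : p becomes m after an m (checked on the output so far)."""
    out = []
    for c in s:
        out.append("m" if c == "p" and out and out[-1] == "m" else c)
    return "".join(out)


def compile_lexical_transducer(lexicon):
    """Precompute lexicon-composed-with-rules as a surface -> lexical index.

    For a finite word list, the composed relation is simply the finite set of
    (lexical, surface) pairs obtained by running each entry through the rules.
    """
    index = {}
    for lexical in lexicon:
        surface = rule_p_to_m(rule_n_to_m(lexical))
        index.setdefault(surface, set()).add(lexical)
    return index


LEXICON = ["kaNpat", "tapa"]  # hypothetical toy lexicon
lookup = compile_lexical_transducer(LEXICON)
print(lookup["kammat"])  # {'kaNpat'}
```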

8.3 Two-Level Morphology

In the spring of 1981, when Kimmo Koskenniemi came to the USA for a visit,
he learned about Kaplan and Kay's finite-state discovery.¹ PARC had begun
work on the finite-state algorithms, but they would prove to be many years
in the making. Koskenniemi was not convinced that efficient morphological
analysis would ever be practical with generative rules, even if they were
compiled into finite-state transducers. Some other way to use finite automata
might be more efficient.

Back in Finland, Koskenniemi invented a new way to describe phonological
alternations in finite-state terms. Instead of cascaded rules with
intermediate stages and the computational problems they seemed to lead to, rules
could be thought of as statements that directly constrain the surface
realization of lexical strings. The rules would not be applied sequentially but in
parallel. Each rule would constrain a certain lexical/surface correspondence
and the environment in which the correspondence was allowed, required, or
prohibited. For his 1983 dissertation, Koskenniemi constructed an ingenious
implementation of his constraint-based model that did not depend on a rule
compiler, composition, or any other finite-state algorithm, and he called it
two-level morphology. Two-level morphology is based on three ideas:

• Rules are symbol-to-symbol constraints that are applied in parallel, not 
sequentially like rewrite rules. 

• The constraints can refer to the lexical context, to the surface context, or 
to both contexts at the same time. 

• Lexical lookup and morphological analysis are performed in tandem.

To illustrate the first two principles, we can turn back to the kaNpat
example again. A two-level description of the lexical-surface relation is sketched
in Figure 2.

1 They weren't then aware of Johnson's 1972 publication.

FIGURE 2 Example of Two-Level Constraints

As the lines indicate, each symbol in the lexical string kaNpat is paired with
its realization in the surface string kammat. Two of the symbol pairs in
Figure 2 are constrained by the context marked by the associated box. The N:m
pair is restricted to the environment having an immediately following p on
the lexical side. In fact the constraint is tighter. In this context, all other
possible realizations of a lexical N are prohibited. Similarly, the p:m pair
requires the preceding surface m, and no other realization of p is allowed
here. The two constraints are independent of each other. Acting in parallel,
they have the same effect as the cascade of the two rewrite rules in Figure 1.
In Koskenniemi's notation, these rules are written as N:m <=> _ p: and
p:m <=> :m _, where <=> is an operator that combines a context
restriction with the prohibition of any other realization for the lexical symbol
of the pair. The colon in the right context of the first rule, p:, indicates that it
refers to a lexical symbol; the colon in the left context of the second rule, :m,
indicates a surface symbol.
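The two rules can be sketched in Python as parallel constraints: each rule is a predicate over a sequence of lexical:surface pairs, and a pairing is accepted only if every rule accepts it. This is a brute-force illustration rather than compiled automata, and the alphabet, the `FEASIBLE` pair set, and all function names are assumptions made for the example.

```python
from itertools import product

# Feasible lexical:surface pairs: identities plus the two alternations.
FEASIBLE = {(c, c) for c in "kaNpmt"} | {("N", "m"), ("p", "m")}


def n_to_m(pairs):
    """N:m <=> _ p:  (right context on the lexical side)."""
    for i, (lex, surf) in enumerate(pairs):
        next_lex = pairs[i + 1][0] if i + 1 < len(pairs) else None
        if (lex, surf) == ("N", "m") and next_lex != "p":
            return False  # =>: N:m only before a lexical p
        if lex == "N" and next_lex == "p" and surf != "m":
            return False  # <=: before a lexical p, N must be m
    return True


def p_to_m(pairs):
    """p:m <=> :m _  (left context on the surface side)."""
    for i, (lex, surf) in enumerate(pairs):
        prev_surf = pairs[i - 1][1] if i > 0 else None
        if (lex, surf) == ("p", "m") and prev_surf != "m":
            return False  # =>: p:m only after a surface m
        if lex == "p" and prev_surf == "m" and surf != "m":
            return False  # <=: after a surface m, p must be m
    return True


RULES = [n_to_m, p_to_m]


def candidates(string, side):
    """Pair sequences whose lexical (side=0) or surface (side=1) side is `string`."""
    options = [[p for p in FEASIBLE if p[side] == c] for c in string]
    return [list(seq) for seq in product(*options)]


def generate(lexical):
    return {"".join(s for _, s in seq) for seq in candidates(lexical, 0)
            if all(rule(seq) for rule in RULES)}


def analyze(surface):
    return {"".join(l for l, _ in seq) for seq in candidates(surface, 1)
            if all(rule(seq) for rule in RULES)}


print(generate("kaNpat"))         # {'kammat'}
print(sorted(analyze("kammat")))  # ['kaNpat', 'kammat', 'kampat']
```

Generation yields a single surface form, but analysis of kammat still returns three lexical candidates: the constraints alone do not remove the overanalysis.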

Two-level rules may refer to both sides of the context at the same time.
The y~ie alternation in English plural nouns could be described by two rules:
one realizes y as i in front of an epenthetic e; the other inserts an epenthetic
e between a lexical consonant-y sequence and a morpheme boundary (+) that
is followed by an s. Figure 3 illustrates the y:i and 0:e constraints.



FIGURE 3 A Two-Level View of y~ie Alternation in English

Note that the e in Figure 3 is paired with a 0 (= zero) on the lexical level. In 
two-level rules, zero is a symbol like any other; it can be used to constrain the 
realization of other symbols, as in y:i <=> _ 0:e. In fact, all the other
rules must "know" where zeros may occur. Zeros are treated as epsilons only 
when two-level rules are applied to strings. 
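The role of zeros can be sketched in Python. The alignment below assumes that the lexical side is spy0+s and the surface side is spie0s, with the boundary + realized as a surface 0; only the y:i rule is encoded, and the function names are hypothetical. While the rule is checked, 0 is just another symbol; the zeros are removed only when the strings are read off.

```python
# Lexical and surface strings of equal length; "0" is an ordinary symbol here.
# Assumption: the boundary "+" is realized as a surface 0.
LEXICAL = "spy0+s"
SURFACE = "spie0s"


def y_to_i(pairs):
    """y:i <=> _ 0:e, where the 0:e pair is matched like any other pair."""
    for i, (lex, surf) in enumerate(pairs):
        nxt = pairs[i + 1] if i + 1 < len(pairs) else None
        if (lex, surf) == ("y", "i") and nxt != ("0", "e"):
            return False  # =>: y:i only before 0:e
        if lex == "y" and nxt == ("0", "e") and surf != "i":
            return False  # <=: before 0:e, y must be i
    return True


def strip_zeros(s):
    """Zeros behave as epsilons only when the strings are read off."""
    return s.replace("0", "")


pairs = list(zip(LEXICAL, SURFACE))
print(y_to_i(pairs))                               # True
print(strip_zeros(LEXICAL), strip_zeros(SURFACE))  # spy+s spies
```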

Like rewrite rules, two-level rules describe regular relations, but there is an
important difference. Because the zeros in two-level rules are ordinary
symbols, a two-level rule represents an EQUAL-LENGTH relation. This has an
important consequence: although regular relations in general are not closed
under intersection, equal-length relations have that property. When a set of
two-level transducers is applied in parallel, the apply routine in fact
simulates the intersection of the rule automata and composes the input string with
the virtual constraint network.
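Because every symbol of an equal-length relation is a lexical:surface pair, two rule automata can be intersected with the standard product construction over the pair alphabet. The Python below is a hand-built illustration for the rules N:m <=> _ p: and p:m <=> :m _ discussed above; the state numbering, the collapsing of all other identity pairs into a single `other` symbol, and the dictionary encoding are assumptions, and a missing transition means rejection.

```python
def sym(pair):
    """Map a (feasible) lexical:surface pair to the automata's symbol."""
    lex, surf = pair
    if lex in ("N", "p") or (lex, surf) == ("m", "m"):
        return f"{lex}:{surf}"
    return "other"  # every other identity pair behaves the same here


# N:m <=> _ p:   state 1 = just read N:m, state 2 = just read N:N
RULE1 = {"start": 0, "finals": {0, 2}, "delta": {
    (0, "N:m"): 1, (0, "N:N"): 2, (0, "p:p"): 0, (0, "p:m"): 0,
    (0, "m:m"): 0, (0, "other"): 0,
    (1, "p:p"): 0, (1, "p:m"): 0,
    (2, "N:m"): 1, (2, "N:N"): 2, (2, "m:m"): 0, (2, "other"): 0,
}}

# p:m <=> :m _   state "B" = previous surface symbol was m
RULE2 = {"start": "A", "finals": {"A", "B"}, "delta": {
    ("A", "N:m"): "B", ("A", "N:N"): "A", ("A", "p:p"): "A",
    ("A", "m:m"): "B", ("A", "other"): "A",
    ("B", "N:m"): "B", ("B", "N:N"): "A", ("B", "p:m"): "B",
    ("B", "m:m"): "B", ("B", "other"): "A",
}}


def intersect(d1, d2):
    """Product construction: a pair sequence is accepted only if both accept."""
    delta = {}
    for (q1, a), r1 in d1["delta"].items():
        for (q2, b), r2 in d2["delta"].items():
            if a == b:
                delta[((q1, q2), a)] = (r1, r2)
    finals = {(f1, f2) for f1 in d1["finals"] for f2 in d2["finals"]}
    return {"start": (d1["start"], d2["start"]), "finals": finals, "delta": delta}


def accepts(dfa, pairs):
    q = dfa["start"]
    for pair in pairs:
        q = dfa["delta"].get((q, sym(pair)))
        if q is None:  # missing transition: the pair is not allowed here
            return False
    return q in dfa["finals"]


BOTH = intersect(RULE1, RULE2)
print(accepts(BOTH, zip("kaNpat", "kammat")))  # True
print(accepts(BOTH, zip("kaNpat", "kampat")))  # False
```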

Applying the rules in parallel does not in itself solve the overanalysis
problem discussed in the previous section. The two constraints sketched above
allow kammat to be analyzed as kaNpat, kampat, or kammat. However, the
problem becomes manageable when there are no intermediate levels of
analysis. In Koskenniemi's 1983 system, the lexicon was represented as a forest
of tries (= letter trees), tied together by continuation-class links from leaves
of one tree to roots of another tree or trees.² Lexical lookup and the analysis
of the surface form are performed in tandem. In order to arrive at the point
shown in Figure 4, the analyzer has traversed a branch in the lexicon that
contains the lexical string kaN. At this point, it only considers symbol pairs
whose lexical side matches one of the outgoing arcs of the current state. It
does not pursue analyses that have no matching lexical path.
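Lexical lookup in tandem with analysis can be sketched in Python as a depth-first walk that advances through a letter trie and the surface string at the same time, pairing a surface symbol only with lexical symbols that label an outgoing arc of the current trie node. This is a simplified, hypothetical illustration: there is one trie instead of a forest linked by continuation classes, the entries are invented, and for brevity the two rules are checked once a path is complete rather than being stepped along with the lexicon as rule automata.

```python
FEASIBLE = {(c, c) for c in "kaNpmt"} | {("N", "m"), ("p", "m")}


def build_trie(words):
    """A single letter trie; "$" marks the end of a lexical entry."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = {}
    return root


def rules_ok(pairs):
    """N:m <=> _ p:  and  p:m <=> :m _, checked on a complete pair sequence."""
    for i, (lex, surf) in enumerate(pairs):
        next_lex = pairs[i + 1][0] if i + 1 < len(pairs) else None
        prev_surf = pairs[i - 1][1] if i > 0 else None
        if lex == "N" and (surf == "m") != (next_lex == "p"):
            return False
        if lex == "p" and (surf == "m") != (prev_surf == "m"):
            return False
    return True


def analyze(surface, trie):
    """Walk the trie and the surface string in tandem."""
    results = []

    def walk(node, i, pairs):
        if i == len(surface):
            if "$" in node and rules_ok(pairs):
                results.append("".join(lex for lex, _ in pairs))
            return
        for lex, child in node.items():
            # Only lexical symbols on an outgoing arc of this node are tried.
            if (lex, surface[i]) in FEASIBLE:
                walk(child, i + 1, pairs + [(lex, surface[i])])

    walk(trie, 0, [])
    return results


LEXICON = build_trie(["kaNpat", "kata", "tapa"])  # hypothetical entries
print(analyze("kammat", LEXICON))  # ['kaNpat']
```

Because the trie has no paths for kampat or kammat, those candidates are never built, and the only analysis of kammat is kaNpat.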

Koskenniemi's two-level morphology was the first practical general model
in the history of computational linguistics for the analysis of
morphologically complex languages. The language-specific components, the rules and
the lexicon, were combined with a universal runtime engine applicable to all 
languages. 

8.4 A Two-Level Rule Compiler 

In his dissertation, Koskenniemi introduced a formalism for two-level rules.
The semantics of two-level rules was well defined, but there was no rule
compiler available at the time. Koskenniemi and other early practitioners of
two-level morphology constructed their rule automata by hand. This is tedious in
the extreme and very difficult for all but very simple rules.

2 The TEXFIN analyzer developed at the University of Texas at Austin (Karttunen et al. 1981).

FIGURE 4 Following a Path in the Lexicon

Although two-level rules are formally quite different from the rewrite rules
studied by Kaplan and Kay, the methods that had been developed for
compiling rewrite rules were applicable to two-level rules as well. In both
formalisms, the most difficult case is a rule where the symbol that is replaced or
constrained appears also in the context part of the rule. Kaplan and Kay had
already solved this problem with an ingenious technique for introducing and
then eliminating auxiliary symbols to mark context boundaries. Another
fundamental insight they had was the encoding of context restrictions in terms
of double negation. For example, a constraint such as "p must be followed
by q" can be expressed as "it is not the case that something ending in p
is not followed by something starting with q." In Koskenniemi's formalism,
this constraint is written as

p => _ q
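In regular-expression terms, with $\Sigma$ the alphabet and $\neg L$ the complement $\Sigma^{*} \setminus L$, the double-negation encoding of "p must be followed by q" can be written as follows (a standard formulation, shown for single symbols p and q):

$$
\neg\bigl(\Sigma^{*}\, p \;\neg(q\,\Sigma^{*})\bigr),
\qquad \text{where } \neg L = \Sigma^{*} \setminus L .
$$

The inner expression $\Sigma^{*}\, p\, \neg(q\,\Sigma^{*})$ describes the offending strings: a prefix ending in p followed by a remainder that does not start with q. Because $\neg(q\,\Sigma^{*})$ also contains the empty string, a string-final p is excluded as well.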

In the summer of 1985, when Koskenniemi was a visitor at Stanford, 
Kaplan and Koskenniemi worked out the basic compilation algorithm for 
two-level rules. The first two-level rule compiler was written in InterLisp by 
Koskenniemi and Karttunen in 1985-87 using Kaplan's implementation of the 
finite-state calculus (Koskenniemi 1986, Karttunen et al. 1987). The current 
C version of the compiler, called TWOLC, was written at PARC in 1991-92
(Karttunen and Beesley 1992).³

Although the basic compilation problem was solved quickly, building a 
practical compiler for two-level rules took a long time. The TWOLC
compiler includes sophisticated techniques for checking and resolving conflicts
between rules whenever possible. Without these features, large rule systems 
would have been impossible to construct and debug. If two constraints are in 
conflict, some lexical forms have no valid surface form. This is a common 
problem and often difficult to remedy even if the compiler is able to detect 
the situation and to pinpoint the cause. 
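A conflict of this kind is easy to reproduce with brute-force checking. In the hypothetical Python sketch below, two <=> rules demand different realizations (m and n) of a lexical N in the same context _ p:, so a lexical form containing Np has no surface form at all. The rule set, the pair alphabet, and the helper names are invented for illustration.

```python
from itertools import product

# Hypothetical pair alphabet in which a lexical N may surface as N, m, or n.
FEASIBLE = {(c, c) for c in "aNpt"} | {("N", "m"), ("N", "n")}


def coerce_before_p(target):
    """Build the constraint N:target <=> _ p: as a predicate on pair sequences."""
    def rule(pairs):
        for i, (lex, surf) in enumerate(pairs):
            next_lex = pairs[i + 1][0] if i + 1 < len(pairs) else None
            # N is realized as `target` exactly when a lexical p follows.
            if lex == "N" and (surf == target) != (next_lex == "p"):
                return False
        return True
    return rule


def generate(lexical, rules):
    options = [[p for p in FEASIBLE if p[0] == c] for c in lexical]
    return {"".join(s for _, s in seq) for seq in product(*options)
            if all(rule(seq) for rule in rules)}


N_TO_M = coerce_before_p("m")
N_TO_N = coerce_before_p("n")  # conflicts with N_TO_M in the same context

print(generate("taNpa", [N_TO_M]))          # {'tampa'}
print(generate("taNpa", [N_TO_M, N_TO_N]))  # set()
print(generate("taNa", [N_TO_M, N_TO_N]))   # {'taNa'}
```

With only the first rule, taNpa surfaces as tampa; adding the conflicting rule leaves no surface form at all, while taNa, which has no N before p, is unaffected.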

8.5 Two-Level Implementations 

Koskenniemi's Pascal implementation was quickly followed by others. The
most influential of them was the KIMMO system by Lauri Karttunen and his
students at the University of Texas (Karttunen 1983, Gajek et al. 1983). This
Lisp project inspired many copies and variations, including those by Beesley
(1989, 1990). A free C implementation of classic Two-Level Morphology,
called PC-KIMMO, from the Summer Institute of Linguistics (Antworth
1990), became a popular tool.

3 The landmark 1994 article by Kaplan and Kay on the mathematical foundations of
finite-state linguistics defines the basic compilation algorithm for phonological rewrite rules and for
Koskenniemi's two-level rules. The article appeared years after the work on the two-level com[…]
current PARC/XRCE regular expression compiler. The article is accurate on the former topic, but
the algorithm for replace rules (Karttunen 1995, 1996, Kempe and Karttunen 1996) differs in
many details from the compilation of rewrite rules as described by Kaplan and Kay.

In Europe, two-level morphological analyzers became a standard component
in several large systems for natural language processing, such as the
British Alvey project (Black et al. 1987, Ritchie et al. 1987, 1992), SRI's CLE
Core Language Engine (Carter 1995), the ALEP Natural Language Engineering
Platform (Pulman 1991), and the MULTEXT project (Armstrong 1996).
ALEP and MULTEXT were funded by the European Commission.⁴

Some of these systems were based on simplified two-level rules, the
so-called partition-based formalism (Ruessink 1989), which was claimed to
be easier for linguists to learn than the original Koskenniemi notation. But
none of these systems had a finite-state rule compiler.⁵ Another difference
was that morphological parsing could be constrained by feature unification.
Because the rules were interpreted at runtime and because of the unification
overhead, these systems were not efficient, and two-level morphology
acquired, undeservedly, a reputation for being slow.

At XRCE and Inxight, the TWOLC compiler was used in the 1990s to
develop comprehensive morphological analyzers for numerous languages.
Another utility, called LEXC (Karttunen 1993b), made it possible to combine a
finite-state lexicon with a set of two-level rules into a single lexical
transducer using a special "intersecting composition" algorithm that simulates
the intersection of the rules while simultaneously composing the virtual rule
transducer with the lexicon. A lexical transducer can be considered the
ultimate two-level model of a language, as it compactly encodes:

• all the lemmas (canonical lexical forms with morphological tags) 
• all the inflected surface forms

• all the mappings between lexical forms and surface forms. 
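Conceptually, the relation that a lexical transducer encodes can be pictured as a set of lexical:surface pairs that can be looked up in either direction. The Python sketch below uses an explicit set, which only works for a finite toy vocabulary; the lemmas, tags, and function names are hypothetical, and a real lexical transducer stores the same relation compactly as a network rather than as a list.

```python
# A toy lexical transducer viewed as a finite set of (lexical, surface) pairs.
# Lemmas, tags, and spellings are hypothetical examples.
PAIRS = {
    ("spy+Noun+Sg", "spy"),
    ("spy+Noun+Pl", "spies"),
    ("spy+Verb+3Sg", "spies"),
    ("cat+Noun+Pl", "cats"),
}


def analyze(surface):
    """Surface form -> lexical forms (lemma plus morphological tags)."""
    return sorted(lex for lex, surf in PAIRS if surf == surface)


def generate(lexical):
    """Lexical form -> surface forms."""
    return sorted(surf for lex, surf in PAIRS if lex == lexical)


print(analyze("spies"))         # ['spy+Noun+Pl', 'spy+Verb+3Sg']
print(generate("spy+Noun+Pl"))  # ['spies']
```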

In the course of this work it became evident that lexical transducers are easier 
to construct with sequentially applied replace rules than with two-level rules. 
Large systems of two-level rules are notoriously difficult to debug. Most
developers of morphological analyzers at XRCE and at companies such as
Inxight have over the years switched to the sequential model and the XFST tool
that includes a compiler for replace rules. The ordering of replace rules seems
to be less of a problem than the mental discipline required to avoid rule
conflicts in a two-level system, even if the compiler automatically resolves most
of them. From a formal point of view there is no substantive difference; a 
cascade of rewrite rules and a set of parallel two-level constraints are just two 
different ways to decompose a complex regular relation into a set of simpler 
relations that are easier to understand and manipulate. 
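For the kaNpat rules, this equivalence can be checked mechanically. The Python sketch below compares a cascade of the two rewrite rules with the two parallel two-level constraints on every lexical string of up to six symbols over a small alphabet and finds the same lexical:surface pairs. It is a brute-force illustration, not a proof, and it rests on a stated assumption: the second rewrite rule is applied left to right with its left context checked on the already rewritten output. With simultaneous application the two models would differ on inputs such as mpp.

```python
from itertools import product

ALPHABET = "aNpm"


def cascade(lexical):
    """Sequential model: N -> m / _ p, then p -> m / m _ (left to right)."""
    step1 = "".join("m" if c == "N" and lexical[i + 1 : i + 2] == "p" else c
                    for i, c in enumerate(lexical))
    out = []
    for c in step1:
        # The left context of the second rule is read from the output so far.
        out.append("m" if c == "p" and out and out[-1] == "m" else c)
    return "".join(out)


def two_level(lexical):
    """Parallel model: N:m <=> _ p: and p:m <=> :m _, by brute force."""
    options = [[(c, c), (c, "m")] if c in "Np" else [(c, c)] for c in lexical]
    results = set()
    for pairs in product(*options):
        ok = True
        for i, (lex, surf) in enumerate(pairs):
            next_lex = pairs[i + 1][0] if i + 1 < len(pairs) else None
            prev_surf = pairs[i - 1][1] if i > 0 else None
            if lex == "N" and (surf == "m") != (next_lex == "p"):
                ok = False
            if lex == "p" and (surf == "m") != (prev_surf == "m"):
                ok = False
        if ok:
            results.add("".join(surf for _, surf in pairs))
    return results


# Every lexical string of length 0 to 6 gets the same single surface form.
for n in range(7):
    for chars in product(ALPHABET, repeat=n):
        word = "".join(chars)
        assert two_level(word) == {cascade(word)}, word
print("cascade and two-level constraints agree")
```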

4 The MULTEXT morphology tool (Petitpierre and Russell 1995) built at ISSCO is available at
http://packages.debian.org/stable/misc/mmorph.html

5 A compilation algorithm has been developed for the partition-based formalism
(Grimley-Evans et al. 1996), but to our knowledge there is no publicly available implementation.
The Beesley and Karttunen (2003) book Finite State Morphology
describes the XFST and LEXC tools and offers a lot of practical advice on
techniques for constructing lexical transducers.⁶

8.6 Reflections 

Although the two-level approach to morphological analysis was quickly
accepted as a useful practical method, the linguistic insight behind it was not
picked up by mainstream linguists. The idea of rules as parallel constraints
between a lexical symbol and its surface counterpart was not taken seriously
at the time outside the circle of computational linguists. Many arguments had
been advanced in the literature to show that phonological alternations could
not be described or explained adequately without sequential rewrite rules. It
went largely unnoticed that two-level rules could have the same effect as
ordered rewrite rules because two-level rules allow the realization of a lexical
symbol to be constrained either by the lexical side or by the surface side. The
standard arguments for rule ordering were based on the a priori assumption
that a rule could refer only to the input context (Karttunen 1993a).

But the world has changed. Current phonologists, writing in the
framework of OT (Optimality Theory), are sharply critical of the "serialist" tradition
of ordered rewrite rules that Johnson, Kaplan and Kay wanted to formalize
(Prince and Smolensky 1993, Kager 1999, McCarthy 2002).⁷ In a nutshell,
OT is a two-level theory with ranked parallel constraints. Many types of
optimality constraints can be represented trivially as two-level rules. In contrast
to Koskenniemi's "hard" constraints, optimality constraints are "soft" and
violable. There are of course many other differences. Most importantly, OT
constraints are meant to be universal. The fact that two-level rules can describe
orthographic idiosyncrasies such as the y~ie alternation in English with no
appeal to universal principles is a minus rather than a plus. It makes the
approach uninteresting from the OT point of view.⁸

Nevertheless, from the OT perspective, two-level rules have some
interesting properties. They are symbol-to-symbol constraints, not string-to-string
relations like general rewrite rules. Two-level rules enable the linguist to
refer to the input and the output context in the same constraint. The notion of
faithfulness (= no change) can be expressed straightforwardly. It is
possible to formulate constraints that directly constrain the surface level. These
ideas were ten years ahead of their time in 1983.

6 The book includes a CD that contains TWOLC, XFST, LEXC and other finite-state tools. See
[…] is included on the CD.

7 The term SERIAL, a pejorative term in an OT context, refers to sequential rule application.
8 Finite-state approaches to Optimality Theory have been explored in several recent articles
(Eisner 1997, Frank and Satta 1998, Karttunen 1998).
It is interesting to observe that computational linguists and "paper-and-pencil
linguists" have historically been out of sync in their approach to
phonology and morphology. When computational linguists implemented
parallel two-level models in the 1980s, paper-and-pencil linguists were still stuck
in the serialist Chomsky-Halle paradigm. When most of the computational
morphologists working with the Xerox tools embraced the sequential model
as the more practical approach in the mid-1990s, a two-level theory took
paper-and-pencil linguistics by storm in the guise of OT.

If one views the mapping from lexical forms to surface forms as a regular 
relation, the choice between different ways of decomposing it has practical 
consequences, but it is not a deep theoretical issue for computational linguists.
No brand of finite-state morphology has ever been promoted as a theory about 
language. Its practitioners have always been focused on the practical task of 
representing the morphological aspects of a language in a form that supports 
efficient analysis and generation. They have been remarkably successful in 
that task. 

Paper-and-pencil morphologists in general are not interested in creating
complete descriptions for particular languages. They design formalisms
for expressing generalizations about morphological phenomena commonly
found in all natural languages. But if it turns out, as in the case of
REALIZATIONAL morphology (Stump 2001), that the theory can be implemented
with finite-state tools (Karttunen 2003), perhaps the phenomena are not as
complex as the linguist has imagined.